Bridging Corpus for Russian in comparison with Czech

Authors

  • Anna Roitberg
  • Anna Nedoluzhko
Abstract

In this paper, we present a syntactic approach to the annotation of bridging relations, so-called genitive bridging. We introduce RuGenBridge, a corpus for Russian annotated with genitive bridging, and compare it to the semantic approach applied in the Prague Dependency Treebank for Czech. We discuss some special aspects of bridging resolution for Russian and the specifics of bridging annotation for languages in which definite nominal groups are not as frequent as, e.g., in Romance and Germanic languages. To verify the consistency of our method, we carry out two comparative experiments: annotating a small portion of our corpus with bridging relations according to both approaches, and finding, for every relation in RuGenBridge, the semantic interpretation that would be annotated for Czech.

Similar resources

Portable Language Technology: Russian via Czech

We report on morphological tagging of Russian using very limited Russian resources. We train the TnT tagger (Brants, 2000) on a modified Czech corpus to get the transition probabilities. We believe that the two languages are similar enough for the transitional information to be useful. The Russian emission symbols are obtained using a morphological analyzer that does not rely on a manually crea...

A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources

In this paper, we describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly, large annotated corpora and lexicons), exploiting instead (i) pre-existing annotated corpora of Czech; (ii) an unannotated corpus of Russian. We show that our approach has benefits, and present what we believe to be one of th...

Statistical Machine Translation Between Related and Unrelated Languages

In this paper we describe an attempt to compare how relatedness of languages can influence the performance of statistical machine translation (SMT). We apply the Moses toolkit on the Czech-English-Russian corpus UMC 0.1 in order to train two translation systems: Russian-Czech and English-Czech. The quality of the translation is evaluated on an independent test set of 1000 sentences parallel in ...

Experiments in Cross-Language Morphological Annotation Transfer

Annotated corpora are valuable resources for NLP which are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and a simple textbook which describes the basic morphological properties of that language is...

Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

The paper deals with corpus-based methods applied to the particular tasks of lexical database construction. Different techniques of the corpus analysis are discussed and their applicability for the tasks is assessed. Corpus management system Manatee + Bonito developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that enables to perform all di...

Publication date: 2016